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Abstract 



We propose a new method to analyze seismic time series and estimate the arrival-times of seismic waves. 

Our approach combines two ingredients: the times series are first lifted into a high-dimensional space using 

^ time-delay embedding; the resulting phase space is then parametrized using a nonlinear method based on 

'""5 the eigenvectors of the graph Laplacian. We validate our approach using a dataset of seismic events that 

CN occurred in Idaho, Montana, Wyoming, and Utah, between 2005 and 2006. Our approach outperforms 

Cn methods based on singular-spectrum analysis, wavelet analysis, and STA/LTA. 
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1. Introduction 

.^H l-l- Estimation of arrival-times 

^^ Seismic waves come in distinct bursts, or arrivals, corresponding to different propagation paths through 

r^ the earth. Arrival-times of seismic waves are indispensable to the determination of the location and type 

Q-i of the seismic event; the precise estimation of arrival-times remains therefore a fundamental problem. This 

paper addresses the problem of estimating arrival-times of local seismic waves from a seismogram. Several 

^S| methods for estimating arrival-times use some variants of the classic current-value-to-predicted-value ratio 

K^ method (e.g. [1, 2, ?>] and references therein). The current value is a short term average (STA) of the energy 

^^ of the incoming data, while the predicted value is a long term average (LTA), so the ratio is expressed as 

, _■ STA/LTA. This ratio is constantly updated as new data flows in, and a detection is declared when the ratio 

/y-^ exceeds a threshold value. When the signal and the noise are Gaussian distributed, the STA/LTA method 

yields an optimal detector that strikes the optimal balance between Type I and Type II errors [4, "»]. As 

explained in [u], seismic waves are non-Gaussian, and therefore higher order statistics (such as skewness and 

f^ kurtosis) can be used to detect the onset of seismic waves [7, 8, 9]. The performance of a detector can be 

improved by enhancing the signal transients relative to the background noise. Several time-frequency and 

time-scale decompositions have been proposed for this purpose (e.g. [10, 11, 12], and references therein). 

Advanced statistical methods can use training data (in the form of seismograms labelled by an analyst). For 

instance, the software developed at the Prototype International Data Center (Arlington, VA) is based on 

2 a multi-layer neural network that uses labelled waveforms in order to predict the types of waves of unseen 

seismograms [l-'^)]. 
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Figure 1: For each time t we construct a patch x(t) composed of d equally-spaced samples of the seismogram. 



1.2. Local analysis of a time series: the concept of patch 

In order to detect the arrival of a seismic wave, and estimate its arrival-time, we propose to characterize 
the local dynamics of the seismogram using a sliding window, or temporal patch (see Fig. 1). Formally, let x{t) 
be one of the components (e.g. the z component, or the first eigenmode after a singular value decomposition 
of the three components) of a seismogram. Let At be the sampling period of the seismogram. The patch 
x{t) is formed by collecting d equally-spaced samples of x{t) and stacking them into a d-dimensional vector. 



x{t)=[x{t) x{t + At) ... x{t+{d-l)At)y 



(1) 



While we generally think of x{t) as a snippet of the original signal over the interval [t, t + dAt) (see Fig. 1), 
in this paper we will think about x(t) as a vector in d dimensions (see Fig. 2). We call patch-space the 
region of R'^ formed by the patch trajectory {x{t),t > 0} (see Fig. 2). Our goal is to detect the presence of 
a seismic wave in the patch x{t) from the location of the patch in patch-space. 

The concept of patches is equivalent to the concept of time-delay coordinates in the context of the analysis 
of a dynamical system from the time series generated by an observable [14, 15, !(>]. In this work, we offer 
a novel perspective on the concept of time-delay coordinates by combining several patch trajectories (from 
several seismograms) and computing a nonlinear parametrization of the combined phase spaces defined by 
the delay coordinates (1). This nonlinear parametrization assigns to each patch x{t) a small number of 
coordinates that uniquely characterize the position of the patch in patch space, while providing the optimal 
separation between baseline patches and arrival patches. 

1.3. Problem statement 

We are interested in detecting seismic waves and estimating the arrival-time of each wave. We model 
the seismogram x{t) as a sum of two components 



x{t) ^b{t) + w{t), 



(2) 



where w{t) represents a seismic wave arriving at time r, and b{t) represents the baseline (or background) 
activity. We assume that b{t) is a slowly varying signal with its energy uniformly distributed over time. In 
contrast, we expect that w(t) will be a fast oscillatory transient localized around the arrival-time r (see Fig. 
3). Our goal is to detect the seismic wave w(i), and estimate its onset r. The difficulty of the problem stems 
from the fact that there is a large variability in the shape and frequency content of the seismic waves w{t). 

We tackle this question by lifting the model (2) into M'' using the time-delay embedding defined by (1). 
As explained in Sec. 2.2, after time-delay embedding, baseline patches that do not overlap with the seismic 
wave w{t) and only contain the baseline signal b{t) become tightly clustered along low-dimensional curves. 
In contrast, arrival patches that include portions of the seismic wave wit) remain at a large distance from 
one another, and are also at a large distance from the baseline patches. The differential organization of the 
baseline and arrival patches in M'' is the first ingredient of our approach. 

The second ingredient of our approach is provided by a parametrization of patch-space that manages to 
cluster baseline patches and arrival patches into two separate groups. This parametrization allows us 
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Figure 2: The trajectory {x{t),t > 0} is a one-dimensional curve in K''. This curve revisits already seen regions of 1 
the curent patch x{t) is similar to a previously seen patch (e.g. the gold and blue patches). 
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to represent patches from M^, where d is of the order of 10"^ using only about 25 coordinates. Finally, the 
last stage of our approach consists in training a classifier to detect patches containing seismic waves. The 
classifier uses the low-dimensional parametrization of patch-space. 

In summary, the contribution of this paper is a novel method to analyze seismograms and estimate 
arrival-times of seismic waves. Our approach includes the following three steps: 

1. Construction of patch-space by time-delay embedding of seismograms; 

2. Low-dimensional parametrization of patch-space; 

3. Construction of a classifier using the low-dimensional parametrization; detection of the presence of 
seismic waves and estimation of arrival-times. 

1.4- Outline 

In the next section we explain how the properties of a waveform x{t) will manifest themselves in terms 
of geometric properties of the patch trajectory {x{t),t > 0}. In particular, we estimate mutual distances 
between patches and we describe the alignment of patches along low-dimensional subspaces. Our analysis 
is performed assuming time is continuous (in Sec. 3.2, we consider the discrete version of the problem). 
Furthermore, we assume that the set of patches is constructed from a single seismogram. In Sec. 3 we 
expand our discussion to the case where patches are extracted from several seismograms collected at different 
stations. In Sec. 3.3, we construct a low-dimensional parametrization of the discrete version of patch-space. 
We use this low-dimensional parametrization in Sec. 4 to build a classifier that learns to detect patches 
made up of seismic waves. Finally, the performance of our approach is quantified in Sec. 5. 



2. Patch- Space 

2.1. Patch trajectories are one-dimensional 

Given a seismogram x{t), the patch x{t) extracted at time i is a vector in M.'^. As t evolves, the set 
of points {x{t),t > 0} forms the patch trajectory. If x{t) is a smooth function of time (has s derivatives), 
then the patch trajectory is a smooth one-dimensional curve in M'^, and the dimension of the curve (one) is 
independent of the patch size d. Indeed, for fc = 0, . . . , d — 1 the maps 

t — > x(t + kAt) (3) 

are all smooth. Therefore each of the coordinates of x{t) is a smooth function of t, and the map 

t ^ x{t) (4) 

is a smooth map from M to W^ and thus is a one-dimensional curve in M'^. 



2.2. Mutual Distance Between Two Patches 

In the following, we assume that the seismogram is described by the model (2) and we study the Euclidean 
distance between any two patches x{ti) and x{tj) extracted at times ti and tj. We first consider the case 
where both patches come from the baseline part of the signal. In this case, we assume w{t) — over the 
intervals [ti,ti + dAt) and [tj,tj + dAt), and we have 



\x{t,) - x{tj)\\'^ ^ ^ {b{t, + kAt) - b{tj + kAt)f 



k=0 



If the baseline signal varies slowly, then we have \b{ti + kAt) — b{tj + kAt)\ w 0, (fc = 0, . . . ,(i — 1), and 
therefore 

d-l 

\\x{ti) - x{tj)\\^ = Y^ (b(t, + kAt) - b{t.j + kAt))^ w 0. 

fc=0 

We now consider the case where one patch, x{ti) (without loss of generality), is part of a seismic wave w{t), 
whereas the other patch, x{tj), comes from the baseline part. We have 



x{ti) ~ x{tj)\\'^ = Y{w{ti + kAt)+b{t, + kAt)~b{tj + kAt))'^ 

k=0 

d-1 d-1 

= ^ w'^{t, + kAt) + 2 ^ w{t, + kAt) {b{t, + kAt) - b{tj + kAt)) 

k=0 fc=0 

d-1 

+ ^ {b{ti + kAt) - b{tj + kAt)f 



[U[ii -\- KIAL) — U[ij -f- KlAi)'' 
k=0 

Again, we can assume that for each k, \b{ti + kAt) — b{tj + kAt)\ w 0, and thus 

d-1 



fe=0 
^d-l 



^ w{t, + kAt) {b{ti + kAt) - b{tj + kAt)) sa 0. 

fe=0 

As before, we have J2k=o i^i^i + kAt) — b{tj + kAt)) w . We conclude that 

d-1 

\\x{ti) ~ x{t,)\\^ = Y, w^{U + kAt). (5) 



k=0 



This sum measures the energy of the (sampled) seismic wave over the interval \ti , ti + dAt) . Because the 
patch size, dAt , is chosen so that u'(i) oscillates several times over the patch (see Fig. 3), the interval 
[ti, ti + dAt) is comprised of several wavelengths of w{t), and the energy (5) is usually large. 

Finally, we consider the case where both patches contain part of the seismic wave w{t) (see Fig. 3), 



d-1 d-1 



\\x{ti) - x{tj)f = Y (HU + kAt) - b{tj + kAt)f + Y (^(^» + kAt) - w{tj + kAt)y 

k=0 fc=0 

d-1 

+ 2 ^ (w(t, + kAt) - w{tj + kAt)) {b{t, + kAt) - b{tj + kAt)) . 



fe=0 

If we assume that the baseline signal varies slowly over time, then we have again 

d-1 

\\x{t^) - x{tj)\\'^ w Y (■^(^^ + kAt) - w{tj + kAt))^ . (6) 

fc=0 
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Figure 3: The two patches (in red and in blue) contain part of the seismic wave ui(i). 

The sum (6) measures the energy of the difference between two overlapping sections of the seismic wave 
w(t), sampled every At (see Fig. 3). In order to estimate the size of this energy, we approximate the seismic 
wave with a cosine function (see Fig. 3), 'w{t) = cos{ujt), where the frequency w corresponds to the peak of 
the short-time Fourier transform of w{t) around r. In this case, we have 



w{U + kAt) - w{tj + kAt) 



cos {uj{U + kAt)} - cos {uj{tj + kAt)} 

2sm {uj{tj - ti)/2} sm{uj [(t, + t^)/2 + kAt]} , 



and the sum (6) becomes 



d-l d-1 

^ {w{t, + kAt) - w{tj + kAt)}^ = 4sin^ {uj{tj - t,)/2} Y] sin^ {^ [{U + tj)/2 + kAt]} . 

k=0 



k=0 



(7) 



The sum in the right-hand side of (7) can be written as 

d-l d-l 

Y^ sin^ {w [(t, + tj)/2 + kAt]} = ^ cos^ L 



fc=0 



k=0 



(t, + tj)/2+^+fcAi]} 



(8) 



The right-hand side of (8) is the energy of the cosine function sampled every At on an interval starting at 
{ti +tj)/2 + TT/2uj of length dAt. As explained above dAt ^ 2tt/uj, and thus the energy (8) is measured over 
several wavelengths and is therefore large. Finally, given a patch starting at time ti, all patches starting at 
time tj, where tj = ti + {q + l/2)27r/w, for some q = 0,1, . . . will satisfy sin^ (w(tj — ti)/2) = 1. We conclude 
that given a patch starting at time ti, there are many choices of tj such that the sum in (7), and therefore 
the norm in (6), are large. 

In summary, we expect the mutual distance between patches extracted from the baseline signal to be 
small, while the mutual distance between baseline and arrival patches will be large. Moreover, we also expect 
that two arrival patches will often be at a large distance of one another. 

2.3. Global Alignment of the Patches 

We have seen that, given a seismogram x{t), the patch trajectory {x{t),t > 0} is a one-dimensional curve 
in M''. We could imagine that this curve would be wildly scattered all over R'^, exploring every part of the 
space. In fact, this is not the case: if the seismogram has bounded derivatives up to order s, then the patch 
trajectory will lie near the intersection of s hyperplanes. In other words, the one-dimensional curve does 
not explore the entire space, but stays inside a subspace of dimension d — s 

We first observe that we can compute numerical estimates of the first s < d — 1 derivatives from the 
d coordinates of the patch x{t). We can therefore quantify the local regularity of the function x{t) over 
the time window [t,t + dAt) from the knowledge of x{t). This observation was at the origin of the first 
embedding theorems [ ] that did not use the notion of time-delay coordinates - which is equivalent to our 




Figure 4; The patch trajectories xi{t), 332 (t) and 0:3 (t) belong to a common low-dimensional submanifold of the d— 2-dimensional 
unit sphere in R'*"-'^. 



notion of patch - but involved the differential coordinates (Px/dtP ,p = 0, . . . , d — f . We consider a forward 
finite difference scheme that approximates the derivative of order p (1 < p < s — 1) at time t, 



dPx 
IttP 



1 



^ajx{t + jM). 



(9) 



i=o 



Obviously, more accurate schemes can be obtained using central differences; we use a forward scheme here 
to simplify the discussion. The finite difference (9) is a linear combination of the first p + 1 coordinates of 
the patch x{t). We assume that (Px/dt^ is bounded by a small constant over an interval U . Consequently, 
the finite difference (9) is small, and we have 






ie U. 



This last statement can be translated as follows: the distance of the patches {x{t),t G C/} to the hyperplane 
of W^ defined by the vector 



[ao ai 







o]-e 



(10) 



is small. We conclude that if the function x{t) has s bounded derivatives for t G U, then the trajectory of 
x{t), for t G U, lies near the intersection of s hyperplanes, each of which is defined by a set of coefficients of 
the form (10). 

2.4- Normalization of Patch-Space 

We now consider the following question: if we want to use seismograms from different stations to learn 
the general shape of a seismic wave (see e.g. [18]), how should we normalize the seismograms? The magni- 
tude of an earthquake, which characterizes its damaging effect, is defined as a logarithmic function of the 
radiated energy [ ]. The radiated energy can be estimated by integrating the velocity associated with the 
displacement measured by seismograms [19]. A logarithmic normalization [19] would make it possible to 
account for the large variability in the energy and would allow us to compare seismograms from different 
stations or from different events. We favor an equivalent normalization that consists in rescaling each patch 
by it energy. For every patch x{t), we first remove any slowly varying drift by removing the mean of x(t) 

over the interval [t, t + dAt), x{t) — ( X]fc=o ^(^ + kAt) J /d, and we compute the centered patch 

Xo{t) ^ [x{t) - x{t) ... x{t+{d-l)At)-x{t)]'^ . 
Finally, we project the centered patch Xo{t) on the unit sphere and define the normalized patch 

Xo{t) 



\Mt)\\ 



(11) 



The normalized patch (11) characterizes the local oscillation of x(t) in a manner that is independent of 
changes in amplitude and of any slow drift of the seismogram. Geometrically, the trajectory of the normalized 
patch is a curve on the d — 2 dimensional unit sphere in M''"^ (see Fig. 4). Indeed, after subtracting the 
mean x{t), the patch lies on the hyperplane of K'^ defined by J2i=i ^i — 0- After the normalization (11), the 
normalized patch lives at the intersection of the unit sphere in M'*, and the hyperplane X]i=i ^« = 0. This 
intersection is also a unit sphere, but in M'*^^. In the remaining of the paper we assume that each patch 
x(t) has been normalized according to (11). 

2.5. The Embedding Dimension 

The patch size is determined by the number, d, of time-delay coordinates. The selection of d can be 
guided by several algorithms that have been proposed in the context of the analysis of nonlinear dynamical 
systems from time series [20, 21, 22]. In practice, if d is chosen too large, then our understanding of 
the geometry of patch-space becomes restricted by the number of patches available. Indeed, the number of 
patches is limited by the number of time samples; but we need a number of patches that grows exponentially 
with the dimension, d ,of patch-space to properly estimate the distribution of the patches [23]. 

3. A new parametrization of patch-space 

3.1. From a single seismogram to several seismograms 

Because we learn to detect arrivals using more than a single seismic trace (as in ] ], for instance) we 
need to understand the structure of patch space when patches come from different seismograms acquired at 
different stations. After normalization, all the patches live on the d — 2-dimensional unit sphere in M'^"^, and 
are therefore characterized by d — 2 coordinates. Each seismogram gives rise to a one-dimensional trajectory 
on the unit sphere (see Fig. 4). We expect that the patch trajectories created by different seismograms will 
remain close to one another, and will not be spread across the unit sphere. Our experiments confirm that 
the trajectories belong to a m-dimensional submanifold embedded in the d — 2-dimensional unit sphere, 
where m <^ d ~ 2. 

3.2. From continuous to discrete patch-space 

In practice, each seismogram is sampled with the sampling period Ai, and we obtain a time series 

Xi = x{ti), where ti = iAt. 

The uniform sampling (in time) of the seismogram leads to a non uniform sampling (in space) of the one- 
dimensional trajectory x{t). We define the discrete patch by 

Xi = [xi Xi+i . . . Xi+(d-i)] ■ (12) 

In order to identify patches that contain seismic waves, we want to characterize the m-dimensional subman- 
ifold of patch trajectories. Formally, we seek a smooth parametrization of the set of patch trajectories that 
uses the minimum number of parameters - theoretically only m. An answer to this question is provided 
by the computation of the principal components of the set of patches {xi, i = 0, . . .}. This method, known 
as singular-spectrum analysis [24, 25[, has been used to characterize geophysical time series [2G, 27[. Un- 
fortunately, unless the patches lie along a low-dimensional linear subspace, many (typically more than m) 
principal components will be required to capture the curvature and torsion of the patch trajectories. This 
problem is quite severe since the part of patch-space that corresponds to arrivals exhibits high curvature. 
Our quest for a parametrization of the submanifold of patch trajectories is similar to the problem of re- 
constructing a phase space based on time-delay coordinates. Indeed, we can consider that seismograms are 
observables from the nonlinear dynamical system that is at the origin of the seismic waves. The authors in 
[2^[ and ['"[ show that patches extracted from volcanic tremors evolve around a low-dimensional attract- 
ing manifold (see also [SO, '.il, 32, 33[) The embedding dimension of the phase space reconstructed from 
Strombolian tremors was estimated in [ [ to be around five (see also [ [ and [ [). Our approach provides 




Figure 5: The patch graph: each node is a patch; nodes are connected if patches are similar. 



a new perspective on this question by directly constructing a smooth parametrization of the set of phase 
spaces associated with the different seisniogranis. Our hypothesis, confirmed by our experiments, is that 
the vectors Xi Hve close to a low-dimensional submanifold of the unit sphere in 



pd-l 



S.3. Nonlinear Parametrization of Patch-Space via the Eigenvectors of the Laplacian 
3.3.1. The patch graph: a network of patches 

Our plan is to assemble a global parametrization ^ of the submanifold of patches from the knowledge 
of the pairwise distances between patches. In principle, we should measure the geodesic distances between 
patches along the underlying submanifold. Unfortunately, the only distances available to us are Euclidean 
distances. This issue can be resolved by observing that the Euclidean and the geodesic distances between 
two patches are very similar if the patches are in close proximity. We therefore use only distances between 
neighboring patches, and disregard distances between faraway patches. 

We describe patch-space with a graph G that is constructed as follows. We assume that we have access 
to N patches Xi,i = 1, . . . ,iV extracted from several seismograms recorded at different stations. All the 
patches have been properly normalized, as explained in Sec. 2.4. Each patch Xi becomes the vertex Xi of 
the graph^. Edges between vertices quantify the proximity between patches. Each vertex Xi is connected to 



its V]siN nearest neighbors according to the Euclidean distance ||: 
connected by an edge {i,j} we write Xi ^ Xj. The weight Wij 
between the patches Xi and Xj and is defined by 



^i — Xj\\. When the vertices Xi and Xj are 
on the edge {i,j} measures the similarity 



W^.j = 



Va^ 



if Xi is connected to Xj 
otherwise. 



(13) 



The scaling factor a modulates our definition of proximity. If cr 3> maxij- ||a;i — cCjH, then for all edges {i, j} 
we have Wij ~ 1. This choice of a blurs the distinction between baseline and arrival patches by pretending 
that all patches are similar to one another {Wij « 1). On the other hand, if cr « 0, then Wij « for 
all edges {i,j} such that \\xi — Xj\\ > 0. Only if \\xi — Xj\\ « do we have Wi,j ^ 0. This choice of a 
accentuates the differences between patches by pretending that all patches are different the minute they are 
slightly different. Obviously, this choice of a is very sensitive to any noise existing in the data. The weighted 
graph G is fully characterized by the N x N weight matrix W with entries Wij. We also define the diagonal 
degree matrix D with entries Dii = ^ W^ij- 

3.3.2. We trust only local distances between patches 

The construction of the parametrization of patch-space is guided by the following two principles: 

• Distances between patches connected by an edge are small and one can approximate their geodesic 
distance by their Euclidean distance. This allows one to measure local geodesic distance without an 
existing knowledge of the underlying submanifold of patch trajectories. 



^We shghtly abuse notation here: Xi is patch as well as a vertex on the graph. 
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• The new parametrization of patch-space assembles the different constraints provided by the local 
geodesic distances into a set of global coordinates that vary smoothly over the submanifold. 

Only an isometry will preserve exactly Euclidean distances between patches, and the isometry which is 
optimal for dimension reduction is given by PCA. However, as shown in the experiments, the coordinates 
provided by PCA are unable to capture the nonlinear structures formed by patch-space. We propose therefore 
to seek a sequence of functions i/ji, ■02, . . . that will become the new coordinates of each patch. 

3.3.3. The new coordinate functions: the eigenfunctions of the Laplacian 

We define each coordinate function ^pk as the solution to the following minimization problem 

™;^^ v^t; — 72r~\ ' (14 

V-fc zZtDi^iil^iixi) 

where ip^ is orthogonal to the previous functions {i/'o, tj^i, ■ ■ ■ , V'fc-i}i 

N 

(V'fc,-0j)=^A,z'0fc(a;OV'j(a;»)=O (j = 0,...,fc-l). (15) 

4 = 1 

The numerator of the Rayleigh ratio (14) is a weighted sum of the gradient of ipk measured along the edges 
{i, j} of the graph; it quantifies the average local distortion introduced by the map xpk- The distortion is 
measured locally: if x^ and Xj are far apart, then Wij « 0, and the difference {ipk{xi) — xjjk{xj)) does 
not contribute to the sum (14). The denominator provides a natural normalization. The constraint of 
orthogonality (15) to the previous coordinate functions guarantees that the coordinates i/jo, ■0i, . . . describe 
the dataset with several resolutions: if {ipkjipj) = then ipk experiences more oscillations on the dataset 
than the previous ■0^ . Intuitively, ipk plays the role of an additional digit that describes the location of Xi 
with more precision. It turns out [^)•"l] that the solution of (14,15) is the solution to the generalized eigenvalue 
problem, 

iD-W)iPk = XkDiPk, fc = 0,l,... (16) 

The first eigenvector -00, associated with Aq — 0, is constant, ■^^{xi) = l,i = 1, . . . ,N; it is therefore not 
used. Finally, the new parametrization ^ is defined by 

Xi^'^iXi) ^ [lpl{Xi) 02(33.0 ... IprniXi)] . (17) 

The matrix P = D ^W is a row-stochastic matrix, associated with a Markov chain on the graph, and the 
matrix L — I—P that appears in (16) is known as the graph Laplacian. The idea of parametrizing a manifold 
using the eigenfunctions of the Laplacian can be traced back to ideas in spectral geometry [36], and has been 
developed extensively by several groups recently \.M, .38, 39, 40]. The construction of the parametrization 
is summarized in Fig. 6 (see also ] ] for a version that can be implemented with fast algorithms). Unlike 
PCA, which yields a set of vectors on which to project each Xi, this nonlinear parametrization constructs 
the new coordinates of Xi by concatenating the values of the xj^k, k — 1, ■ ■ ■ ,m evaluated at Xi, as defined 
in (17). 

3.3.4. How many new coordinates do we need? 

Our goal is to construct the most parsimonious parametrization of patch-space with the smallest number 
m of coordinates. We expect that if m is too small, then the new parametrization will not describe the data 
with enough precision, and the detection of seismic waves will be inaccurate. On the other hand, if m is too 
large, then some coordinates will be mainly describing the noise in the dataset (and not adding additional 
information) , and the classification algorithm will overfit the training data. The experiments confirmed that 
TO = 25 yields the optimal detection of seismic waves - and performed better than m — 50 - even when 
as many as d = 1024 time samples are included in each patch. Clearly, this approach results in a very 
significant reduction of dimensionality. 



Algorithm 1: Parametrization of patch-space 



Inpig^^ of TV patches, Xi,i = 1,- ■ ■ ,N; m: dimension of the embedding 

a: width of the kernel for the graph; vnn'- number of neighbors of each patch; 

Algorithm: 

1. construct the graph defined by the i^^v/v nearest neighbors of each x^ 

2. compute the weight matrix W. and the degree matrix D 

3. compute the m eigenvectors -01, ... , ip„i of D^sWD^ i 

Output: m coordinates '4' (aji) = [ipiixi) ip2{xi) ... ipmixi)^ . 

Figure 6: Construction of the embedding 

3.4- The case of a single station and a single event: a dynamical system connection 

We consider in this section the simpler situation where patch space is constructed using only seismograms 
recorded at a single station from a single event. We identify patch space with the phase space, reconstructed 
by time-delay embedding, of the dynamical system associated with the earthquake. In the following, we 
explore the connections between the graph Laplacian defined on phase space and several methods that have 
been proposed to characterize a dynamical system. 

3.4.1. Recurrence quantification analysis 

We first observe that the weight matrix W defined by (13) is closely related to the recurrence plot matrix 
associated with a dynamical system [42] . The recurrence plot matrix R is defined by 



R 



1 if \\xi — XjW < 5, 



1 otherwise. 

We can interpret the recurrence plot matrix R in terms of the weight matrix of a graph defined as follows: 
each vertex Xi is connected to the vertices that are within a distance 5 oi Xi, and the weights along the 
edges are equal to one. This graph is similar to the vpfj^-neaiest neighbor graph used in this work when S 
is chosen so that on average each vertex is connected to vnn vertices, and cr = 00 in (13). Several methods 
have been proposed to recover information about x{t) from the knowledge of R [43, 44]. In this paper, we 
demonstrate that we can indeed construct a smooth parametrization of the phase space {x(t),t > 0} from 
the knowledge of the weight matrix W only. 

3.4.2. Complex networks in phase space 

The recurrence plot matrix R can also be interpreted as the adjacency matrix of a graph that captures 
the dynamical system. Several authors have proposed to characterize nonlinear dynamical systems by 
studying such graphs, known as "recurrence networks" [45, 46, 47] (see also [48, 4!J] for graphs that are 
constructed using the i^nn nearest neighbors in phase space, as we do). After the graph is constructed, 
geometric properties of the graph, such as diameter, path lengths distribution, are computed (see ] '"] and 
references therein for a detailed review). Because such geometric properties can be computed directly from 
the eigenvalues Xk defined by (16) [35], the methods proposed in [45, 46, 47, 48, 49] are in fact equivalent to 
the problem of studying the Laplacian defined on the network. Our approach can therefore be understood 
as a spectral characterization of phase space obtained from the eigenfunctions and the eigenvalues of the 
Laplacian defined on a recurrence network. 
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3.4-3. Frohenius- Perron Operator 

Finally, one can define a Frobenius-Perron operator, similar to the matrix i-*, that characterizes the 
evolution of the probability distribution of the configurations of phase space [oO]. The eigenfunctions of the 
Frobenius-Perron operator can be used to partition phase space into almost invariant subsets [51, 52, 53]: a 
trajectory initiated in each of these subsets remains in the subset for a long time before escaping to another 
almost invariant subset ['lO]. The eigenfunctions of the Perron- Frobenius operator are exactly similar to 
the ij^k- In summary, in the simpler case of the analysis of a single earthquake from a single station, our 
approach can be interpreted as a method to decompose phase space in terms of regions where the long term 
dynamics are similar ['>'■'>]. 

4. Estimation of Arrival-Times of Seismic Waves 

4-.1. Learning the presence of seismic waves in patch-space 

Our goal is to learn the association between the presence of a seismic wave within a patch, and the values 
of the patch coordinates. As explained before, we advocate a geometric approach: we expect that patches 
will organize themselves on the unit sphere in M'^"^ in a manner that will reveal the presence of seismic 
waves. We represent all the patches with the coordinates defined by ^ in (17). We then use training data 
(labelled by experts) to partially populate patch-space with information about the presence or absence of 
seismic waves. We combine the information provided by the labels with the knowledge about the geometry 
of patch-space to train a classifier; this approach is known as semi-supervised [54]. We then use the classifier 
to classify unlabelled patches into baseline, or arrivals patches. The classification problem is formulated as 
a kernel ridge regression problem [55]: for any given patch, the classifier returns a number between and 1 
that quantifies the probability that a seismic wave be present within the patch. 

We assume that Ni of the A^ patches have been labelled by an expert (analyst): for each of these patches 
we know if a seismic wave was observed in the patch, and at what time. We construct a response function 
/ defined on the new coordinates, ^{xi) £ M."\ and taking values in [0, 1], 

/:M™^[0,1] (18) 

^{x,) -^fi^Sfix,)). (19) 

The classifier decides that the patch Xi contains an arrival if the response /(^(x^)) is greater than some 
threshold £ > 0. The threshold e controls the rates of false alarms and missed detections: a small e results 
in many false alarms but will rarely miss arrivals, and vice versa. We expand the response function as a 
linear combination of Gaussian kernels in M'", 

Ni 

fi^ix)) = ^/3, exp {-||*(a;) - ^{x,)\\ya'} . (20) 

The vector of unknown coefficients /3 = [/3i, . . . , /^at,] is computed using the training data. The kernel ridge 
regression [55] combines two ideas: distances between patches are measured using the Gaussian kernel K, 
with entries K^j = exp { — 11^(0;^) — '4'(a;i)|p/a^}, i,j = 1, . . . ,iV;; and the classifier is designed to provide 
the simplest model of the response in terms of the Ni training data. Rather than trying to find the optimal 
fit of the function / to the Ni labelled patches, we penalize the regression (20) by imposing a penalty on the 
norm of (3 [55]. This prevents the model (20) from overfitting the training samples. The optimal regression 
is defined as the solution to the quadratic minimization problem 

\\r-K(3\\^+^i(3'^K(3, (21) 

where r ~ [ri ... tatJ is the known response on the Ni labelled patches. The parameter /i controls the 
amount of penalization: /i = yields a least squares fit, while /Lt = oo ignores the data. For a given choice 
of /I, the optimal vector of coefficients [55] is given by 

(3={K + ^iI)-'r, (22) 
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Figure 7; Seismic traces Xi (blue); estimated responses r^ (red); arrival-times tj (black vertical bars). 



where I is the Ni x Ni identity matrix. In our experiments, the ridge parameter fi was determined by 
cross-vahdation, and the same value, /i — 0.8, was used throughout. The Gaussian width a is chosen to be 
a multiple of the average kernel distance. 



a'^C (average,__,Jl*(a;,) - *(a;,)|p) 
for all experiments we chose C = 0.51. 



(23) 



4-.2. Defining Ground truth 

4-. 2.1. Uncertainty in arrival-time 

In order to validate our approach we need to compare the output of the response function /, defined 
by (20), to the actual decision provided by an expert (analyst). This comparison needs to be performed 
for every patch being analyzed. Unfortunately, the decision of the analyst is usually formulated as a binary 
response: an arrival is present at time t^ or not. We claim that this apparent perfect determination of the 
arrival-time is misleading. Indeed, as was pointed out in [ : ], the origin time and the arrival-time at a given 
station are, for all practical purposes, random variables whose distributions depend on quality of the seismic 
record and the training and experience of the analyst detecting the arrivals. We formalize this intuition and 
model the arrival-time estimated by the analyst as a Gaussian distribution with mean tj and variance hj. 
The parameter hj controls the width of the Gaussian and quantifies the confidence with which the analyst 
estimated tj . Ideally, hj should be a function of the inter-observer variability for the estimation of tj . In 
this work, we propose to estimate the uncertainty hj directly from the seismogram. For each arrival-time tj, 
we compute the dominant frequency io of x(t) using a short Fourier transform. Let T = 1/lu be the period 
associated with w, we define the uncertainty hj as follows 



"^ = \h 



J2T/At ii2T/At<h„ 
otherwise. 



This choice of hj corresponds to the following idea: if the seismogram were to be a pure sinusoidal function 
oscillating at the frequency w (see Fig 3), then this choice of hj would guarantee that we observe two periods 
(cycles) of x{t) over a time interval of length hj. 

Finally, we define the true response r^ at time ti to be the maximum of the Gaussian bumps associated 
with the arrival-time times Tj nearest to time ti, 



ri — max 
j 



{exp(-(i, -Tj)V/ij)}. 



(24) 



Figure 7 displays two seismograms with different values for hj. In the top seismogram the first arrival is 
very localized (small hi), whereas the second arrival corresponds to a lower instantaneous frequency, and 
is therefore less localized (large /12). In the second seismogram (bottom of Fig. 7) the first two arrivals are 
very close to one another resulting in an overlap of the Gaussians defining the response r^. 
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Figure 8: Raw and filtered seismic traces with associated STA/LTA outputs; high energy locahzation (A) and very diffuse 
energy locahzation (B). Analyst picks are represented by bars. 



4-2.2. Energy localization of seismic traces 

Because we analyze seismic traces of very different quality, we need to define a concept of energy local- 
ization associated with a seismic trace. Indeed, variability in the estimation of arrival-times in human is 
less pronounced when a seismic trace contains very localized arrivals [')7]. We plan therefore to evaluate the 
performance of our method as a function of the energy localization level. For this purpose, we define the 
energy localization of a given trace to be the ratio of the energy of the seismic waves over the energy of the 
baseline activity. This ratio can be defined in terms of the set of patches that are extracted from the trace. 
For a given trace, let A be the subset of patches that contain arrivals, and B the complement of A: the 
subset of patches that contain only baseline activity. We define the energy localization by the ratio 




(25) 



where \A\ (resp. \B\) is the cardinality of A (resp. B). Figure 8 shows two seismic traces with very 
different energy localizations. Arrival-times assigned by the analyst are represented by vertical bars. The 
short window used to compute the STA is three second long (120 samples) and the long window, which 
immediately follows the short window, is 27 second long (1080 samples). Before computing the STA/LTA 
ratio, we apply a standard preprocessing step and filter the raw seismic trace with a passband ([0.8, 3.5] Hz) 
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Figure 9: Locations of the stations and events from the Rocky Mountain region. 

Butterworth filter (see Fig. 8). For tiie purpose of visual comparison, we normalized the STA/LTA output 
so that its maximum value is one. The trace (A) in Fig. 8 has a large energy localization, while the trace 
(B) has a very low energy llocalization. The Butterworth filter is able to remove some of the irrelevant 
low-frequency oscillations in (B) and yields a signal that can be processed by STA/LTA. We note that the 
second arrival (Lg) in the first trace (A) is missed by the STA/LTA algorithm. 



5. Results 

5.1. Rocky Mountain Dataset 

We validate our approach with a dataset composed of broadband seismic traces from seismic events that 
occurred in Idaho, Montana, Wyoming, and Utah, between 2005 and 2006 (see Fig. 9). Arrival-times have 
been determined by an analyst. The ten events with the largest number of arrivals were selected for analysis. 
In total, we used 84 different station records from ten different events containing 226 labelled arrivals. Of 
the 226 labelled arrivals, there are 72 Pn arrivals, 70 Pg arrivals, 6 Sn arrivals, and 78 Lg arrivals. The 
sampling rate was 1/At = 40 Hz. We consider only the vertical channel in our analysis. To minimize the 
computational cost, patches are spaced apart by 40At (one second). 

5.2. Validation of the Classifier 

The performance of the algorithm varies as a function of the energy localization S, and therefore we 
perform three independent validations by dividing the seismic traces into three homogeneous subsets:ni — 27 
traces with low energy localization (S' < 3), 712 = 29 traces with medium energy localization (3 < S* < 18), 
and the remaining n^ — 28 traces with high energy localization { S > 18). For each subset s, (s — 1, 2, 3), we 
perform a standard leave-one-out cross-validation [55] using Ug folds as follows. We choose a test seismogram 
Xt{t) among the Ug traces and compute the optimal set of weights (22) for the kernel ridge classifier (20) 
using the remaining rig — 1 traces. Patches Xi are then randomly selected from the test seismogram Xt{t) 
and the classifier computes the response function f{'^{xi)). The response of the classifier is compared to 
the true response r^ for various false alarms and missed detections levels. We repeat this procedure for each 
possible test seismogram Xt{t) among the n^ seismograms. Figure 10 details the cross-validation procedure. 
We quantify the performance of the classifier using a Receiver Operating Characteristic (ROC) curve [55]. 
The true detection rate (one minus the type II error) is plotted against the type I error (false alarm rate). 
We characterize each ROC curve by the area under the curve (the closer to one, the better) . 

5.3. Optimization of the parameters of the algorithm 

The optimal values of the parameters were computed using cross-validation. This procedure turned 
out to be very robust, since we used the same parameters for all experiments. The optimal classification 
performance was achieved by choosing a — co and i^jvAf = 32 in the construction of the graph Laplacian. 
This is equivalent to setting the weights Wi,j on the edges to be 1, and yields a graph that is extremely 
robust to noise. The influence of the patch size on the classification performance can be found in a series of 
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Algorithm 2: Cross validation of the classification 



Input: Seismic traces, and the associated responses (r^). 
Algorithm: 

for s = 1 to 3 // for each subset of seismic traces 

extract a total of N patches from n^ distinct seismic traces, 
compute new coordinates ^{xi) of each patch Xi i — 1, . . . ,N 

for j = 1 to Us II evaluate the classifier for each seismic trace j 

build classifier using all patches except those from trace j 
for all patches x^ in trace j compute classifier response /(^(a;^)) 
for e = ei to e„ 

// populate the ROC curve using different thresholds to detect an arrival 
if f{'if{xj)) > e and r^ < Eq then declare false positive 
else if f{^{xl)) < e and r^ > Eq then declare false negative 
end if 

record false positive and false negative rates for patches in fold j 
end for 
end for 
compute average false positive and false negative rate 

end for 

Output: area under the ROC curve. 

Figure 10: Cross validation procedure. 

ROC curves in Fig. 12. We observe in Fig. 12 that the STA/LTA algorithm performs best for seismograms 
with low energy localization. Indeed, waveforms with high energy localization contain very little energy in 
the baseline part of the waveform. As a result, the STA/LTA ratio is much larger for the primary waves than 
it is for the secondary waves, and STA/LTA often misses the secondary waves. For traces with low energy 
localization, the ratio between baseline and arrivals energies is much smaller. This causes the STA/LTA ratio 
to reach similar values for the primary and secondary arrivals. Figure 13 shows three seismic traces with 
different energy localizations. STA/LTA (magenta) misses the secondary wave for medium and high energy 
localizations. Our approach (green) can detect the primary and secondary waves at all energy localization 
levels. For seismic traces with low energy localization our approach cannot compete with STA/LTA when 
d drops below 256. Of course, it is unfair to compare our approach using patches of only 128 samples 
with STA/LTA, which uses 1080 time samples (27 seconds). Indeed, as soon as d > 1024, our approach 
outperforms STA/LTA. Interestingly, our approach does not benefit from using a much larger d; when 
d — 2048 the scale of the local analysis is no longer adapted to the physical process that we study. 

5.4- So what does the set of patches look like? 

To help us gain some insight into the geometric organization of patch space we display the patches using 
some of the new coordinates '^'(a;^). For the three subsets of patches (classified according to the energy 
localization), we display in Fig. 11 each patch as a dot using three coordinates of the coordinate vectors^. 
Because we can only display three coordinates, among the 25 (or 50) that we use to classify the patches, 
we use the three coordinates that best reveal the organization of patch-space. The color of the dot encodes 
the presence (orange) or absence (blue) of an arrival within Xi. As the energy localization increases the 
separation between baseline patches and arrival patches improves. This visual impression is confirmed using 
the quantitative evaluation performed with the ROC curves (see Fig. 12). Clearly the shape of the set of 
patches is not linear, and would not be well approximated with a linear subspace. 
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Figure 11: Each patch Xi is represented as a dot using three coordinates of the vector ^. The color encodes the presence 
(orange) or absence (blue) of an arrival within Xi. The energy localization levels increases from left to right. 



5.5. Classification Performance 

We first compare our approach to the gold-standard provided by STA/LTA. The second stage of the 
evaluation consists in quantifying the importance of the nonlinear dimension reduction 'S' defined by (17). 
To gauge the effect of "^ we replace it by two linear transforms: a wavelet transform and a PCA transform. 
In both cases, we reduce the dimensionality of each patch from d to m. Wavelets have been used for a long 
time in seismology because seismograms can be approximated with very high precision using a small number 
of wavelet coefficients (e.g. ["iX, 10, .^)f)] and references therein). On the other hand, we can also try to find 
the best linear approximation to a set of N patches. This linear approximation is obtained using PCA (also 
known as the singular-spectrum analysis [ ], see section 3.2). The first m vectors of a PCA analysis yields 
the subspace that provides the optimal TTi-dimensional approximation to the set of patches. Our experiments 
demonstrate that the set of patches is not a linear structure and therefore is poorly approximated using 
PCA. 

5.5.1. STA/LTA Ratio 

We implement an STA/LTA detector as follows. We first apply a passband ([0.8 - 3.5] Hz) Butterworth 
filter to the raw traces. We then compute the ratio of the energy over two adjacent time window: a short 
window of 120 time samples (2 s) immediately followed by a long window of 1080 time samples (27 s). An 
arrival is detected when this ratio exceeds a threshold. 

5.5.2. PCA and Wavelet Representations of Patch- Space 

An orthonormal wavelet transform (symmlet 8) provides a multiscale decomposition of each patch Xi 
in terms of d coefficients. Many of the coefficients are small and can be ignored. In order to decide which 
wavelet coefficients to retain, we select a fixed set of m/2 indices corresponding to the largest coefficients of 
the baseline patches. Similarly, we select the m/2 indices associated to the largest coefficients among the 
patches that contain arrivals. This procedure allows us to define a fixed set of m wavelet coefficients that are 
used for all patches as in input to the ridge regression algorithm. Similarly, we keep the first m coordinates 
returned by PCA. 

5.5.3. Parameters of the classifier based on the PCA and Wavelet Representations 

After applying a wavelet transform, or PCA, we use the same ridge classifier to detect arrivals. The 
parameters of the classifier are optimized for the wavelet and PCA transforms respectively. The Gaussian 
width a was again chosen to be a multiple of the average kernel distance between any two patches (see (23)). 
The parameter C in equation (23) was set to C = 6.9 for the wavelet parametrization, and C = 4.6 for the 
PCA parametrization. The ridge regression parameter was the same for both wavelets and PCA and was 
equal to /i = lO^'^. 
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Figure 12: ROC curves for various values of the embedding dimension d at three levels of energy localization. 
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Figure 13: Seismic trace Xi (blue); true response r^ (red); STA/LTA (magenta); classifier /(^(ai^)) (green). 



6. Discussion 

Table 1 provides a detailed summary of the performance of our approach. For each energy localization 
level (see section 5.2 for the definition of the subsets), we report the performance of the different detection 
methods as a function of d (patch dimension) and m (reduced dimension). The performance is quantified 
using the area under the ROC curve (the ROC curves are shown in Fig. 12); a perfect detector should have 
an area equal to one. 

Effect of the patch size. As expected, the patch-based methods perform poorly if the patch is too small 
(there is not enough information to detect the seismic wave) or too large (the information is smeared over 
too large a window). The choice of the optimal patch size is dictated by the physical processes at stake here, 
since the optimal size is the same for all methods, irrespective of the transform used to reduce dimensionality. 
For high energy localization seismograms, the seismic waves are very localized and therefore all algorithms 
perform better with smaller patches (256 or 512 instead of 1024). 

Effect of the transform used to reduce dimensionality. The experiments indicate that PCA outper- 
forms a wavelet decomposition at every energy localization level. Both PCA and the wavelet transform are 
orthonormal transforms that can be understood in terms of a rotation of patch-space. PCA provides the 
optimal rotation to align patch-space along the m-dimensional subspace of best-fit. Finally, the nonlinear 
transformation ^ based on the eigenfunctions of the Laplacian outperforms both PCA and wavelets. This 
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Table 1: Area under the ROC curve (closer to 1 is better) as a function of the patch dimension d and the reduced dimension 
m, at three different energy localization levels S. 

d STA/LTA Wavelet PCA Laplacian 

dimension 25 50 25 50 25 50 



Low S 


64 


- 


0.53 


0.53 


0.51 


0.55 


0.53 


0.53 




128 


- 


0.55 


0.49 


0.52 


0.51 


0.52 


0.52 




256 


- 


0.52 


0.47 


0.51 


0.54 


0.54 


0.57 




512 


- 


0.53 


0.50 


0.61 


0.61 


0.62 


0.64 




1024 


0.66 


0.43 


0.38 


0.54 


0.64 


0.70 


0.71 




2048 


- 


0.45 


0.39 


0.55 


0.48 


0.61 


0.62 


Mid 5 


64 


- 


0.54 


0.52 


0.52 


0.54 


0.57 


0.57 




128 


- 


0.57 


0.55 


0.53 


0.56 


0.66 


0.66 




256 


- 


0.61 


0.62 


0.70 


0.71 


0.71 


0.71 




512 


- 


0.68 


0.67 


0.76 


0.79 


0.79 


0.81 




1024 


0.68 


0.77 


0.76 


0.81 


0.84 


0.86 


0.86 




2048 


- 


0.64 


0.66 


0.69 


0.75 


0.80 


0.80 


HighS* 


64 


- 


0.56 


0.62 


0.65 


0.65 


0.72 


0.67 




128 


- 


0.68 


0.69 


0.78 


0.73 


0.80 


0.79 




256 


- 


0.72 


0.70 


0.77 


0.84 


0.88 


0.87 




512 


- 


0.74 


0.79 


0.73 


0.85 


0.90 


0.90 




1024 


0.59 


0.72 


0.76 


0.67 


0.75 


0.88 


0.89 




2048 


- 


0.51 


0.49 


0.57 


0.67 


0.76 


0.74 



clearly indicates that the set of patches contains nonlinear structures that cannot be well approximated 
by the optimal linear subspace computed by PCA. Interestingly, the results (not shown) are not improved 
by applying a wavelet transform before applying the nonlinear map '3/ (17) (see [wi] for an example of a 
combination of wavelet transform with a nonlinear map similar to 'S'). 

Dimension of patch-space. The performance is not significantly improved when 50 coordinates are used 
instead of 25. This is a result that is independent of the method used to reduced dimensionality, and is 
therefore a statement about the complexity of patch-space and about the physical nature of the seismic 
traces. As mentioned before, several studies have estimated the dimensionality of the low-dimensional 
inertial manifold reconstructed from the phase space of the tremors of a single volcano. This dimensionality 
was found in most studies [ ^, 2!), '44] to be less than five: a number much smaller than our rough estimate 
of the dimensionality of patch space. Because patch space includes several seismograms from different events 
measured at different stations, we expect the dimensionality of this set to be greater than the dimensionality 
of the phase space reconstructed from the tremors of a single volcano measured at a single station. On 
the other hand, our study confirms that the combined phase spaces associated with regional seismic waves 
remains remarkably low-dimensional. 

Computational complexity The complexity of this method is mainly determined by the combined com- 
plexity of the nearest neighbor search and the eigenvalue problem. We currently use the restarted Arnold! 
method for sparse matrices implemented by the Matlab function eigs to solve the eigenvalue problem. We 
use the fast approximate nearest neighbor algorithm implemented by the ANN library [f i 1 ] to construct the 
graph of patches. 

Parametrization of slow manifolds with the eigenvectors of the Laplacian. As discussed in Sec. 3.2, 
our approach is related to the problem of parametrizing the low dimensional attracting manifold associated 
with the nonlinear dynamical system that is at the origin of the seismic waves. Similar ideas have been 
proposed in the context of molecular dynamics. In [ ], the authors estimate slow variables that can be 
used to coarse-grain the dynamics near an inertial manifold. The slow variables are the eigenfunctions of a 
Fokker-Planck diffusion process defined on simulated data. Similarly, it was shown in [ , ] that invariant 
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subsets associated with a dynamical system can be identified by studying the eigenvectors of a probabihty 
transition matrix - similar to the row-stochastic matrix D ^W defined in (16) - in the phase space of the 
dynamical system. Horenko [ '>] proposes to reconstruct the phase space generated by time-delay embedding 
using a combination of K of m-dimensional subspaces. In contrast, we use a nonlinear parametrization in 
this paper. 

7. Concluding remarks 

In this paper we presented a novel method to estimate from a seismogram arrival-times of seismic waves. 
We use time-delay embedding to characterize the local dynamics over temporal patches extracted from 
the seismic waveforms. We combine several delay-coordinates phase spaces formed by the different patch 
trajectories, and construct a graph that quantifies the distances between the different temporal patches. 
The eigenvectors of the Laplacian defined on the graph provide a low-dimensional parametrization of the 
combined phase spaces. Finally, a kernel ridge regression learns the association between each configuration 
of the phase space and the presence of a seimic wave. The regression is performed using the low-dimensional 
parametrization of the set of patches. Our approach outperforms standard linear techniques, such as wavelets 
and singular-spectrum analysis, and makes it possible to capture the nonlinear structures of the phase space 
reconstructed from time-delay embedding. This method unites the existing theory on time-delay embedding 
and the recent results on the nonlinear parametrization of manifold- valued data [37, 38, 39, 40]. We expect 
that our approach may be applicable to time series that are generated by other complex nonlinear dynamical 
processes, such as neurophysiological data, financial data, etc. We also expect that the idea of reconstructing 
phase space using the eigenvectors of the graph Laplacian may be used to remove noise from data generated 
by nonlinear dynamical systems [()4, 65], and predict time-series [66, 67]. 
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